Crosscorrelation-based multispeaker speech activity detection

نویسندگان

  • Kornel Laskowski
  • Qin Jin
  • Tanja Schultz
چکیده

We propose an algorithm for segmenting multispeaker meeting audio, recorded with personal channel microphones, into speech and non-speech intervals for each microphone’s wearer. An algorithm of this type turns out to be necessary prior to subsequent audio processing because, in spite of close-talking microphones, the channels exhibit a high degree of crosstalk due to unbalanced calibration and small inter-speaker distance. The proposed algorithm is based on the short-time crosscorrelation of all channel pairs. It requires no prior training and executes in one fifth real time on modern architectures. Using meeting audio collected at several sites, we present error rates for the segmentation task which do not appear correlated with microphone type or number of speakers. We also present the resulting improvement in speech recognition accuracy when segmentation is provided by this algorithm.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multispeaker Speech Activity Detection for the Icsi Meeting Recorder

As part of a project into speech recognition in meeting environments, we have collected a corpus of multi-channel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in...

متن کامل

Improved speech activity detection using cross-channel features for recognition of multiparty meetings

We describe the development of a speech activity detection system using an HMM-based segmenter for automatic speech recognition on individual headset microphones in multispeaker meetings. We look at cross-channel features (energy and correlation based) to incorporate into the segmenter for the purpose of addressing errors related to cross-channel phenomena such as crosstalk. Results demonstrate...

متن کامل

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

متن کامل

Separation of Multispeaker Speech Using Excitation Information

In this paper, we propose an approach for separating speech of individual speakers from a multispeaker speech signal using excitation source information. The proposed approach is demonstrated in a two-microphone case. The main issue in the two-microphone case is the estimation of delay of each speaker. We propose a method for delay estimation in multispeaker case using the knowledge of excitati...

متن کامل

Hidden Markov Model Based Speech Activity Detection for the ICSI Meeting Project

As part of a project into speech recognition in meeting environments, we have collected a corpus of multi-channel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004